Data Augmentation Using Multi-Input Multi-Output Source Separation for Deep Neural Network Based Acoustic Modeling
Abstract
We investigate the use of local Gaussian modeling (LGM) based source separation to improve speech recognition accuracy. Previous studies have applied LGM based source separation successfully both to runtime speech enhancement and to the enhancement of training data for deep neural network (DNN) based acoustic modeling. In this paper, we propose a data augmentation method that exploits the multi-input multi-output (MIMO) characteristic of LGM based source separation. We first compare unprocessed multi-microphone signals with the multi-channel output signals of LGM based source separation as augmented training data for DNN based acoustic modeling. Experimental results on the third CHiME challenge dataset show that the proposed data augmentation outperforms conventional data augmentation. In addition, we experiment with beamforming applied to the source-separated signals as runtime speech enhancement. The results show that the proposed runtime beamforming further improves speech recognition accuracy.
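The two ideas in the abstract, using every output channel as its own training example and beamforming the separated channels at runtime, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `augment_by_channel` and `delay_and_sum` are hypothetical, the LGM separation step itself is not shown, and delay-and-sum is assumed here only as a simple stand-in, since the abstract does not say which beamformer is used.

```python
from typing import List, Sequence, Tuple

Signal = List[float]


def augment_by_channel(
    utterances: Sequence[Tuple[Sequence[Sequence[float]], str]]
) -> List[Tuple[Signal, str]]:
    """Return one (signal, transcript) training example per channel.

    Each input item is (channels, transcript). The channels may be raw
    microphone signals or the multi-channel outputs of a separation
    front end (assumed to be computed elsewhere).
    """
    examples: List[Tuple[Signal, str]] = []
    for channels, transcript in utterances:
        for signal in channels:
            # Every channel shares the transcript of its utterance.
            examples.append((list(signal), transcript))
    return examples


def delay_and_sum(
    channels: Sequence[Sequence[float]], delays: Sequence[int]
) -> Signal:
    """Average the channels after advancing each by its sample delay.

    `delays` holds non-negative integer delays, one per channel; this
    simple beamformer is an illustrative assumption.
    """
    if len(channels) != len(delays):
        raise ValueError("need exactly one delay per channel")
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Keep only the samples that every shifted channel can provide.
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[d + n] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(max(length, 0))
    ]
```

With six channels per utterance, for example, `augment_by_channel` yields six training examples for every recorded utterance, whereas `delay_and_sum` collapses the channels into a single signal for the recognizer at runtime.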
Similar resources
Modeling heat transfer of non-Newtonian nanofluids using hybrid ANN-Metaheuristic optimization algorithm
An optimal artificial neural network (ANN) has been developed to predict the Nusselt number of non-Newtonian nanofluids. The resulting ANN is a multi-layer perceptron with two hidden layers consisting of six and nine neurons, respectively. The tangent sigmoid transfer function is the best for both hidden layers and the linear transfer function is the best transfer function for the output layer....
Development of an in-cylinder processes model of a CVVT gasoline engine using artificial neural network
Today, employing model based design approach in powertrain development is being paid more attention. Precise, meanwhile fast to run models are required for applying model based techniques in powertrain control design and engine calibration. In this paper, an in-cylinder process model of a CVVT gasoline engine is developed to be employed in extended mean valve control oriented model and also mod...
Persian Phone Recognition Using Acoustic Landmarks and Neural Network-based variability compensation methods
Speech recognition is a subfield of artificial intelligence that develops technologies to convert speech utterance into transcription. So far, various methods such as hidden Markov models and artificial neural networks have been used to develop speech recognition systems. In most of these systems, the speech signal frames are processed uniformly, while the information is not evenly distributed ...
Rejection of the Feed-Flow Disturbances in a Multi-Component Distillation Column Using a Multiple Neural Network Model-Predictive Controller
This article deals with the issues associated with developing a new design methodology for the nonlinear model-predictive control (MPC) of a chemical plant. A combination of multiple neural networks is selected and used to model a nonlinear multi-input multi-output (MIMO) process with time delays. An optimization procedure for a neural MPC algorithm based on this model is then developed. T...
Acoustic scene classification using convolutional neural network and multiple-width frequency-delta data augmentation
In recent years, neural network approaches have shown superior performance to conventional hand-made features in numerous application areas. In particular, convolutional neural networks (ConvNets) exploit spatially local correlations across input data to improve the performance of audio processing tasks, such as speech recognition, musical chord recognition, and onset detection. Here we apply C...
Publication date: 2016